{
 "cells": [
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "---\n",
    "hide_table_of_contents: true\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Custom vectorstores\n",
    "\n",
    "If you want to interact with a vectorstore that is not already present as an [integration](/docs/integrations/vectorstores), you can extend the [`VectorStore` class](https://api.js.langchain.com/classes/langchain_core_vectorstores.VectorStore.html).\n",
    "\n",
    "This involves overriding a few methods and properties:\n",
    "\n",
    "- `FilterType`, which declares the type of filter your vectorstore accepts. Declare it if your vectorstore supports filtering by metadata.\n",
    "- `addDocuments`, which embeds and adds LangChain documents to storage. This is a convenience method that should generally use the `embeddings` passed into the constructor to embed the document content, then call `addVectors`.\n",
    "- `addVectors`, which is responsible for saving embedded vectors, document content, and metadata to the backing store.\n",
    "- `similaritySearchVectorWithScore`, which searches for vectors within the store by similarity to an input vector, and returns a list of `[document, score]` tuples for the most relevant documents.\n",
    "- `_vectorstoreType`, which returns an identifying string for the class. Used for tracing and type-checking.\n",
    "- `fromTexts` and `fromDocuments`, which are convenience static methods for initializing a vectorstore from data.\n",
    "\n",
    "There are a few optional methods too:\n",
    "\n",
    "- `delete`, which deletes vectors and their associated metadata from the backing store based on arbitrary parameters.\n",
    "- `maxMarginalRelevanceSearch`, an alternative search mode that fetches more candidate vectors than requested, reranks them to optimize for diversity, then returns the top results. This can help reduce redundancy in the returned results.\n",
    "\n",
    "A few notes:\n",
    "\n",
    "- Different databases provide varying levels of support for storing raw content/extra metadata fields. Some higher level retrieval abstractions like [multi-vector retrieval](/docs/modules/data_connection/retrievers/multi-vector-retriever) in LangChain rely on the ability to set arbitrary metadata on stored vectors.\n",
    "- Generally, search parameters that are not used directly to filter returned vectors by their metadata should be passed into the constructor.\n",
    "\n",
    "Here is an example of an in-memory vectorstore with no persistence that ranks results by cosine similarity:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import { VectorStore } from \"@langchain/core/vectorstores\";\n",
    "import type { EmbeddingsInterface } from \"@langchain/core/embeddings\";\n",
    "import { Document } from \"@langchain/core/documents\";\n",
    "\n",
    "import { similarity as ml_distance_similarity } from \"ml-distance\";\n",
    "\n",
    "interface InMemoryVector {\n",
    "  content: string;\n",
    "  embedding: number[];\n",
    "  metadata: Record<string, any>;\n",
    "}\n",
    "\n",
    "export interface CustomVectorStoreArgs {}\n",
    "\n",
    "export class CustomVectorStore extends VectorStore {\n",
    "  declare FilterType: (doc: Document) => boolean;\n",
    "\n",
    "  memoryVectors: InMemoryVector[] = [];\n",
    "\n",
    "  _vectorstoreType(): string {\n",
    "    return \"custom\";\n",
    "  }\n",
    "\n",
    "  constructor(\n",
    "    embeddings: EmbeddingsInterface,\n",
    "    fields: CustomVectorStoreArgs = {}\n",
    "  ) {\n",
    "    super(embeddings, fields);\n",
    "  }\n",
    "\n",
    "  async addDocuments(documents: Document[]): Promise<void> {\n",
    "    const texts = documents.map(({ pageContent }) => pageContent);\n",
    "    return this.addVectors(\n",
    "      await this.embeddings.embedDocuments(texts),\n",
    "      documents\n",
    "    );\n",
    "  }\n",
    "\n",
    "  async addVectors(vectors: number[][], documents: Document[]): Promise<void> {\n",
    "    const memoryVectors = vectors.map((embedding, idx) => ({\n",
    "      content: documents[idx].pageContent,\n",
    "      embedding,\n",
    "      metadata: documents[idx].metadata,\n",
    "    }));\n",
    "\n",
    "    this.memoryVectors = this.memoryVectors.concat(memoryVectors);\n",
    "  }\n",
    "\n",
    "  async similaritySearchVectorWithScore(\n",
    "    query: number[],\n",
    "    k: number,\n",
    "    filter?: this[\"FilterType\"]\n",
    "  ): Promise<[Document, number][]> {\n",
    "    const filterFunction = (memoryVector: InMemoryVector) => {\n",
    "      if (!filter) {\n",
    "        return true;\n",
    "      }\n",
    "\n",
    "      const doc = new Document({\n",
    "        metadata: memoryVector.metadata,\n",
    "        pageContent: memoryVector.content,\n",
    "      });\n",
    "      return filter(doc);\n",
    "    };\n",
    "    const filteredMemoryVectors = this.memoryVectors.filter(filterFunction);\n",
    "    const searches = filteredMemoryVectors\n",
    "      .map((vector, index) => ({\n",
    "        similarity: ml_distance_similarity.cosine(query, vector.embedding),\n",
    "        index,\n",
    "      }))\n",
    "      // Sort by descending similarity so the closest matches come first\n",
    "      .sort((a, b) => b.similarity - a.similarity)\n",
    "      .slice(0, k);\n",
    "\n",
    "    const result: [Document, number][] = searches.map((search) => [\n",
    "      new Document({\n",
    "        metadata: filteredMemoryVectors[search.index].metadata,\n",
    "        pageContent: filteredMemoryVectors[search.index].content,\n",
    "      }),\n",
    "      search.similarity,\n",
    "    ]);\n",
    "\n",
    "    return result;\n",
    "  }\n",
    "\n",
    "  static async fromTexts(\n",
    "    texts: string[],\n",
    "    metadatas: object[] | object,\n",
    "    embeddings: EmbeddingsInterface,\n",
    "    dbConfig?: CustomVectorStoreArgs\n",
    "  ): Promise<CustomVectorStore> {\n",
    "    const docs: Document[] = [];\n",
    "    for (let i = 0; i < texts.length; i += 1) {\n",
    "      const metadata = Array.isArray(metadatas) ? metadatas[i] : metadatas;\n",
    "      const newDoc = new Document({\n",
    "        pageContent: texts[i],\n",
    "        metadata,\n",
    "      });\n",
    "      docs.push(newDoc);\n",
    "    }\n",
    "    return this.fromDocuments(docs, embeddings, dbConfig);\n",
    "  }\n",
    "\n",
    "  static async fromDocuments(\n",
    "    docs: Document[],\n",
    "    embeddings: EmbeddingsInterface,\n",
    "    dbConfig?: CustomVectorStoreArgs\n",
    "  ): Promise<CustomVectorStore> {\n",
    "    const instance = new this(embeddings, dbConfig);\n",
    "    await instance.addDocuments(docs);\n",
    "    return instance;\n",
    "  }\n",
    "}"
   ]
  },
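  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the ranking step in `similaritySearchVectorWithScore` more concrete, here is a rough, dependency-free sketch of the same idea: apply the optional filter, score each remaining vector by cosine similarity, sort in descending order, and keep the top `k`. The names `ToyVector`, `cosineSimilarity`, and `topK`, along with the two-dimensional toy embeddings, are assumptions made up for illustration and are not part of LangChain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "// Hypothetical toy record: a hand-written 2-dimensional embedding stands in for real model output.\n",
    "interface ToyVector {\n",
    "  content: string;\n",
    "  embedding: number[];\n",
    "  metadata: Record<string, unknown>;\n",
    "}\n",
    "\n",
    "// Cosine similarity: dot(a, b) / (|a| * |b|). Yields NaN if either vector is all zeros.\n",
    "function cosineSimilarity(a: number[], b: number[]): number {\n",
    "  let dot = 0;\n",
    "  let normA = 0;\n",
    "  let normB = 0;\n",
    "  for (let i = 0; i < a.length; i += 1) {\n",
    "    dot += a[i] * b[i];\n",
    "    normA += a[i] * a[i];\n",
    "    normB += b[i] * b[i];\n",
    "  }\n",
    "  return dot / (Math.sqrt(normA) * Math.sqrt(normB));\n",
    "}\n",
    "\n",
    "// Filter first, then score, sort by descending similarity, and keep the top k.\n",
    "function topK(\n",
    "  query: number[],\n",
    "  vectors: ToyVector[],\n",
    "  k: number,\n",
    "  filter?: (vector: ToyVector) => boolean\n",
    "): [ToyVector, number][] {\n",
    "  return vectors\n",
    "    .filter((vector) => (filter ? filter(vector) : true))\n",
    "    .map((vector): [ToyVector, number] => [\n",
    "      vector,\n",
    "      cosineSimilarity(query, vector.embedding),\n",
    "    ])\n",
    "    .sort((a, b) => b[1] - a[1])\n",
    "    .slice(0, k);\n",
    "}\n",
    "\n",
    "const toyVectors: ToyVector[] = [\n",
    "  { content: \"a\", embedding: [1, 0], metadata: { id: 1 } },\n",
    "  { content: \"b\", embedding: [0, 1], metadata: { id: 2 } },\n",
    "  { content: \"c\", embedding: [1, 1], metadata: { id: 3 } },\n",
    "];\n",
    "\n",
    "// Exclude the vector with id 1, then rank the rest against the query [1, 0].\n",
    "const ranked = topK([1, 0], toyVectors, 2, (vector) => vector.metadata.id !== 1);\n",
    "\n",
    "ranked.map(([vector, score]) => [vector.content, score]);"
   ]
  },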
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, we can call this vectorstore directly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[\n",
       "  Document {\n",
       "    pageContent: \u001b[32m\"Mitochondria is the powerhouse of the cell\"\u001b[39m,\n",
       "    metadata: { id: \u001b[33m1\u001b[39m }\n",
       "  },\n",
       "  Document {\n",
       "    pageContent: \u001b[32m\"Buildings are made of brick\"\u001b[39m,\n",
       "    metadata: { id: \u001b[33m2\u001b[39m }\n",
       "  }\n",
       "]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import { OpenAIEmbeddings } from \"@langchain/openai\";\n",
    "import { Document } from \"@langchain/core/documents\";\n",
    "\n",
    "const vectorstore = new CustomVectorStore(new OpenAIEmbeddings());\n",
    "\n",
    "await vectorstore.addDocuments([\n",
    "  new Document({ pageContent: \"Mitochondria is the powerhouse of the cell\", metadata: { id: 1 } }),\n",
    "  new Document({ pageContent: \"Buildings are made of brick\", metadata: { id: 2 } }),\n",
    "]);\n",
    "\n",
    "await vectorstore.similaritySearch(\"What is the powerhouse of the cell?\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or, we can interact with the vectorstore as a retriever:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[\n",
       "  Document {\n",
       "    pageContent: \u001b[32m\"Mitochondria is the powerhouse of the cell\"\u001b[39m,\n",
       "    metadata: { id: \u001b[33m1\u001b[39m }\n",
       "  },\n",
       "  Document {\n",
       "    pageContent: \u001b[32m\"Buildings are made of brick\"\u001b[39m,\n",
       "    metadata: { id: \u001b[33m2\u001b[39m }\n",
       "  }\n",
       "]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "const retriever = vectorstore.asRetriever();\n",
    "\n",
    "await retriever.invoke(\"What is the powerhouse of the cell?\");"
   ]
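  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The example class above does not implement the optional `maxMarginalRelevanceSearch` method. As a rough illustration of the reranking idea behind it, the sketch below runs a greedy maximal marginal relevance selection over raw vectors: each step picks the candidate with the best trade-off between similarity to the query and similarity to candidates already chosen. The helper names `cosine` and `mmrSelect`, the `lambda` weighting parameter, and the toy vectors are assumptions for illustration only. This is not LangChain's implementation, and a real override would also need to fetch candidates from the backing store and map the chosen indices back to documents.\n",
    "\n",
    "With these toy vectors, plain relevance ranking (`lambda = 1`) returns the two near-duplicate candidates, while `lambda = 0.5` swaps the second one for the dissimilar candidate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "// Cosine similarity between two equal-length vectors. Yields NaN for all-zero vectors.\n",
    "function cosine(a: number[], b: number[]): number {\n",
    "  let dot = 0;\n",
    "  let normA = 0;\n",
    "  let normB = 0;\n",
    "  for (let i = 0; i < a.length; i += 1) {\n",
    "    dot += a[i] * b[i];\n",
    "    normA += a[i] * a[i];\n",
    "    normB += b[i] * b[i];\n",
    "  }\n",
    "  return dot / (Math.sqrt(normA) * Math.sqrt(normB));\n",
    "}\n",
    "\n",
    "// Hypothetical helper: returns the indices of the k candidates chosen by greedy MMR.\n",
    "// lambda = 1 ranks purely by relevance; lower values penalize redundancy more.\n",
    "function mmrSelect(\n",
    "  query: number[],\n",
    "  candidates: number[][],\n",
    "  k: number,\n",
    "  lambda = 0.5\n",
    "): number[] {\n",
    "  const selected: number[] = [];\n",
    "  const remaining = candidates.map((_, index) => index);\n",
    "  while (selected.length < k && remaining.length > 0) {\n",
    "    let bestPosition = 0;\n",
    "    let bestScore = -Infinity;\n",
    "    for (let position = 0; position < remaining.length; position += 1) {\n",
    "      const candidate = candidates[remaining[position]];\n",
    "      const relevance = cosine(query, candidate);\n",
    "      // Redundancy is the highest similarity to anything already selected, floored at 0.\n",
    "      let redundancy = 0;\n",
    "      for (const chosen of selected) {\n",
    "        redundancy = Math.max(redundancy, cosine(candidate, candidates[chosen]));\n",
    "      }\n",
    "      const score = lambda * relevance - (1 - lambda) * redundancy;\n",
    "      if (score > bestScore) {\n",
    "        bestScore = score;\n",
    "        bestPosition = position;\n",
    "      }\n",
    "    }\n",
    "    // Move the winning index from the remaining pool into the selection.\n",
    "    selected.push(remaining.splice(bestPosition, 1)[0]);\n",
    "  }\n",
    "  return selected;\n",
    "}\n",
    "\n",
    "// Toy data: candidates 0 and 1 are near-duplicates, candidate 2 points elsewhere.\n",
    "const toyQuery = [0.8, 0.6];\n",
    "const toyCandidates = [\n",
    "  [1, 0],\n",
    "  [0.9, 0.1],\n",
    "  [0, 1],\n",
    "];\n",
    "\n",
    "const mmrPicked = mmrSelect(toyQuery, toyCandidates, 2);\n",
    "const relevanceOnly = mmrSelect(toyQuery, toyCandidates, 2, 1);\n",
    "\n",
    "({ mmrPicked, relevanceOnly });"
   ]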
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Deno",
   "language": "typescript",
   "name": "deno"
  },
  "language_info": {
   "file_extension": ".ts",
   "mimetype": "text/x.typescript",
   "name": "typescript",
   "nb_converter": "script",
   "pygments_lexer": "typescript",
   "version": "5.3.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
